Postgres
- Row-based database
 - Open source
 
Data Warehouse
Extensions
- Citus
 - Swarm64
 
Citus
- Columnar storage
 - Sem suporte para update e delete à data
 
Tips
- don't use the same server as your production system
 - upgrade to pg 12+ (or avoid common table expressions in your queries)
 - go easy on indexes – less is more
 - consider partitioning long tables
 - ensure you're not I/O-bound
 - vacuum analyze after bulk insertion
 - explore parallel queries
 - increase statistics sampling
 - use fewer columns on frequently-queried tables
 
Reasons not to use indexes
- For many analytics queries it's faster for Postgres to do a table scan than an index scan
 - Indexes increase the size of the table. The smaller the table, the more will fit in memory.
 - Indexes add additional cost on every insert / update
 
Types of Index
https://www.postgresql.org/docs/current/indexes-types.html
- B-tree
 - Hash
 - GiST
 - SP-GiST
 - GIN
- Used for text-search
 - Could be used for general json search
 
 - BRIN
 
Partial Indexes
Index of only some rows of a given table source.
CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
    WHERE NOT (client_ip > inet '192.168.100.0' AND client_ip < inet '192.168.100.255');
Partitions
Divide uma tabela em várias o que pode reduzir o tempo de respota se uma tiver que aceder apenas a uma query. No entanto tem um trabalho extra de manutenção (as tabelas não são criadas automaticamente).
Replication
Postgres has 2 modes:
- Physical
 - Logica (more robust)
 
MVCC
Multi-version concurrency control utilizado para garantir ACID em transações concorrentes
Explain Analyze
- Explain shows the query execution plan
 - Analyze actually runs the query
 
Options
- Analyze
 - buffers
 - Verbose
 - Settings
 - Wal: shows the wall usage on DML
 
EXPLAIN (ANALYZE, BUFFERS) /* SQL statement */;
Costs
1 unit Cost = read 8KB page during sequential scan
Startup cost: cost to return first row
Total cost: cost to return all rows
How to check?
- Finds the nodes with the most execution times
 - find the lowest node where the estimate is bigger thant the plant (usually 10x more)
 - find long running sequential scans with a filter condition. This can be improved with an index
 
Tools
Import data
- COPY is faster than INSERT
 - Increase checkpoints is better
 - Logged table are slower than unlogged tables but can crash and doesn't have replication
 - Create indexes after inserting data
 - Avoid triggers when writing
 - Improving column order and space consumption
- Fixed sized columns first
 
 
Notification/Listen
- Can one subscribe to changes? (alternative to streaming system)
 
Postgres 14
- Perf improvements
 - Adds multirange type
 
Postgres 16
- Allows for logical replication from secondary instances
 - Many performance improvements due to parallelization
 - Adds the pg_stats_io for monitoring IO